Unbiased split selection for classification trees based on the Gini Index

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Unbiased split selection for classification trees based on the Gini Index

The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification using continuous predictors by means of a combinatorial approach. This distribution provides a formal support for variable selection bias in favor of variables with a high amount of missing values wh...

متن کامل

Split Selection Methods for Classification Trees

Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected...

متن کامل

An (almost) unbiased estimator for the S-Gini index∗

This note provides an unbiased estimator for the absolute S-Gini and an almost unbiased estimator for the relative S-Gini for integer parameter values. Simulations indicate that these estimators perform considerably better then the usual estimators, especially for small sample sizes. 1 The absolute and relative S-gini indices Assume that income is distributed according to a continuous and diffe...

متن کامل

Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index

Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable selection bias in classification tree algorithms based on the Gini Index can be caused not only by the statistical effect of multiple comparisons, but also by an increasing estimation bias and variance of the spli...

متن کامل

An Improved Text Classification Method Based on Gini Index

In text classification, the purity of the Gini index can be used. When purity value is greater, the characteristic of the information contained in the attribute is higher, and the feature distinguishing capability is stronger. But using the Gini purity formula on feature weight, the classification result is not very good, one of the main reasons is those rare words only appearing in one categor...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Computational Statistics & Data Analysis

سال: 2007

ISSN: 0167-9473

DOI: 10.1016/j.csda.2006.12.030